Efficiently Evaluating Skyline Queries on RDF Databases
Authors
Abstract
Skyline queries are a class of preference queries that compute the Pareto-optimal tuples from a set of tuples and are valuable in multi-criteria decision-making scenarios. While this problem has received significant attention in the context of a single relational table, skyline queries over joins of multiple tables, which are typical of storage models for RDF data, have received much less attention. A naïve approach such as a join-first-skyline-later strategy separates the join and skyline computation phases, which limits opportunities for optimization. Other existing techniques for multi-relational skyline queries assume storage and indexing techniques that are not typically used with RDF, so applying them would require a preprocessing step for data transformation. In this paper, we present an approach for optimizing skyline queries over RDF data stored using a vertically partitioned schema model. It is based on the concept of a “Header Point”, which maintains a concise summary of the already visited regions of the data space. This summary allows some fraction of non-skyline tuples to be pruned before they advance to the skyline processing phase, thus reducing the overall cost of the expensive dominance checks required in that phase. We further present more aggressive pruning rules that result in the computation of near-complete skylines in significantly less time than the complete algorithm. A comprehensive performance evaluation of the different algorithms is presented using datasets with different types of data distributions generated by a benchmark data generator.
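To make the terms in the abstract concrete, the following is a minimal sketch of the naïve join-first-skyline-later baseline it mentions, not the Header Point approach itself, whose details the abstract does not give. It assumes that every dimension is minimized and that each vertically partitioned property table can be modeled as a subject-to-value dictionary; the function names and the hotel data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_vertical_partitions(
    partitions: List[Dict[str, float]],
) -> Dict[str, Tuple[float, ...]]:
    """Join per-property tables (subject -> value) on the subject key.
    Only subjects present in every table survive the join."""
    subjects = set(partitions[0])
    for table in partitions[1:]:
        subjects &= set(table)
    return {s: tuple(table[s] for table in partitions) for s in sorted(subjects)}


def skyline(points: Dict[str, Tuple[float, ...]]) -> Dict[str, Tuple[float, ...]]:
    """Keep every tuple that no other tuple dominates (quadratic pairwise checks)."""
    return {
        s: p
        for s, p in points.items()
        if not any(dominates(q, p) for t, q in points.items() if t != s)
    }


# Hypothetical property tables, one per RDF predicate.
price = {"hotel1": 100.0, "hotel2": 80.0, "hotel3": 120.0, "hotel4": 90.0}
distance = {"hotel1": 2.0, "hotel2": 5.0, "hotel3": 1.0, "hotel4": 6.0}

joined = join_vertical_partitions([price, distance])  # phase 1: full join
result = skyline(joined)                              # phase 2: dominance checks
print(sorted(result))
```

In this sketch, hotel4 is dropped because hotel2 is both cheaper and closer. Every joined tuple reaches the pairwise dominance checks, which is the cost that pruning before the skyline phase aims to reduce.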
Similar resources
Emrooz: A Scalable Database for SSN Observations
The design of ontologies for sensor data and metadata has received considerable attention. The most prominent is arguably the Semantic Sensor Network (SSN) ontology. For persistence and retrieval of sensor observations, systems that adopt the SSN ontology most obviously build on an RDF database (triple store). However, large volumes of collected sensor data can be challenging for RDF databases,...
Computing Continuous Skyline Queries without Discriminating between Static and Dynamic Attributes
Although most existing skyline query algorithms focus on querying static points in static databases, the demand for continuous skyline queries has increased with the expanding number of sensors, wireless communications and mobile applications. Unlike traditional skyline queries which only consider static attributes, continuous skyline queries include dynamic attribute...
Progressive skylining over Web-accessible databases
Skyline queries return a set of interesting data points that are not dominated on all dimensions by any other point. Most of the existing algorithms focus on skyline computation in centralized databases, and some of them can progressively return skyline points upon identification rather than all in a batch. Processing skyline queries over the Web is a more challenging task because in many Web a...
SPARQL over GraphX
The ability of the RDF data model to link data from heterogeneous domains has led to an explosive growth of RDF data. So, evaluating SPARQL queries over large RDF data has been crucial for the semantic web community. However, due to the graph nature of RDF data, evaluating SPARQL queries in relational databases and common data-parallel systems needs a lot of joins and is inefficient. On the oth...
The Spatial Nearest Neighbor Skyline Queries
User preference queries are very important in spatial databases. With the help of these queries, one can find the best location among the points saved in a database. In many situations, users evaluate the quality of a location by its distance from its nearest neighbor among a special set of points. There has been less attention about evaluating a location with its distance to nearest neighbors in spatial us...